Following BGC detection and annotation, all antiSMASH and GECCO BGCs were aggregated with experimentally validated BGCs from the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database, version 3.1. The combined set comprised 9059 BGCs, which were clustered into 1640 Gene Cluster Families (GCFs).
To explore and visualise the relationships between the identified GCFs, dimensionality reduction was performed using the graph-based Uniform Manifold Approximation and Projection (UMAP) tool.
Figure: relationship between clusterings at different resolutions.
The clustering resolution was set to 0.5, resulting in 9 clusters (numbered 0 to 8).
Some of the clusters that seemed to overlap in the 2D UMAP are more separated in the 3D version (e.g. cluster 3).
| Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| mixed | 182 | 49 | 48 | 20 | 43 | 0 | 5 | 14 | 1 |
| NRPS | 8 | 174 | 59 | 14 | 11 | 3 | 10 | 11 | 3 |
| PKS | 217 | 3 | 41 | 4 | 29 | 1 | 6 | 4 | 3 |
| RiPP | 2 | 3 | 10 | 122 | 14 | 51 | 5 | 5 | 3 |
| saccharide | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| terpene | 0 | 9 | 1 | 0 | 5 | 12 | 21 | 13 | 1 |
| unknown | 9 | 58 | 66 | 29 | 43 | 56 | 44 | 32 | 62 |
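There is considerable overlap between the assigned Seurat clusters and the predicted biosynthetic class: cluster 0 consists mainly of PKS (217) and mixed (182) GCFs, cluster 1 mainly of NRPS (174), and cluster 3 mainly of RiPP (122). Clusters 5 to 8, by contrast, contain mostly GCFs of unknown class. This suggests that biosynthetic class is a good indicator of similarity between GCFs, at least for the well-annotated classes.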
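The overlap can be quantified per cluster by asking which class is most frequent and what share of the cluster it makes up. The following is a minimal sketch in Python, not the report's own workflow; it uses the counts from the table above, and the helper name `dominant_class` is hypothetical.

```python
# Contingency table copied from the report: rows are predicted classes,
# columns are Seurat clusters 0-8, values are GCF counts.
counts = {
    "mixed":      [182, 49, 48, 20, 43, 0, 5, 14, 1],
    "NRPS":       [8, 174, 59, 14, 11, 3, 10, 11, 3],
    "PKS":        [217, 3, 41, 4, 29, 1, 6, 4, 3],
    "RiPP":       [2, 3, 10, 122, 14, 51, 5, 5, 3],
    "saccharide": [0, 0, 0, 0, 0, 0, 1, 0, 0],
    "terpene":    [0, 9, 1, 0, 5, 12, 21, 13, 1],
    "unknown":    [9, 58, 66, 29, 43, 56, 44, 32, 62],
}


def dominant_class(table, cluster):
    """Return (class, count, cluster_total, share) for one cluster column.

    Ties are resolved in favour of the class listed first in `table`.
    """
    column = {cls: row[cluster] for cls, row in table.items()}
    total = sum(column.values())
    top = max(column, key=column.get)
    return top, column[top], total, column[top] / total


for cluster in range(9):
    cls, n, total, share = dominant_class(counts, cluster)
    print(f"cluster {cluster}: {cls} {n}/{total} ({share:.0%})")
```

The column totals add up to 1640, the number of GCFs, which confirms that the table counts GCFs rather than individual BGCs.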
| GCF_method | Number of GCFs |
|---|---|
| antiSMASH | 8 |
| Gecco | 7 |
| MIBiG | 1607 |
| mixed | 18 |
| GCF | Number of BGCs | Genomes with at least one BGC from the GCF (%) |
|---|---|---|
| GCF0000070 | 143 | 15.50218 |
| GCF0000073 | 149 | 18.34061 |
| GCF0000075 | 110 | 21.17904 |
| GCF0000082 | 268 | 37.77293 |
| GCF0000092 | 264 | 28.38428 |
| GCF0000100 | 1009 | 100.00000 |
| GCF0000923 | 367 | 75.10917 |
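The two columns measure different things: the BGC count includes every member of the GCF, while the percentage counts each genome only once. A GCF can therefore hold more BGCs than there are genomes, as with GCF0000100 (1009 BGCs, present in 100% of genomes). A minimal Python sketch of this calculation follows; the function name `gcf_prevalence` and the toy records are hypothetical and do not come from the report's data.

```python
from collections import defaultdict


def gcf_prevalence(bgc_records, n_genomes):
    """Per GCF: (number of BGCs, % of genomes with at least one BGC).

    `bgc_records` is an iterable of (genome, gcf) pairs, one per BGC.
    """
    n_bgcs = defaultdict(int)
    genomes = defaultdict(set)
    for genome, gcf in bgc_records:
        n_bgcs[gcf] += 1          # every BGC counts towards the GCF size
        genomes[gcf].add(genome)  # each genome counts once per GCF
    return {
        gcf: (n_bgcs[gcf], 100 * len(genomes[gcf]) / n_genomes)
        for gcf in n_bgcs
    }


# Hypothetical toy input: four genomes, two GCFs.
records = [("g1", "GCF_A"), ("g1", "GCF_A"), ("g2", "GCF_A"), ("g3", "GCF_B")]
result = gcf_prevalence(records, n_genomes=4)
print(result)
```

In the toy input, `GCF_A` has three BGCs but occurs in only two of the four genomes, so its prevalence is 50%.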
| Cluster | mixed | NRPS | PKS | RiPP | unknown |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 |
| 2 | 1 | 1 | 0 | 1 | 2 |
| 3 | 2 | 1 | 0 | 6 | 0 |
| 4 | 0 | 0 | 1 | 0 | 1 |
| 5 | 0 | 0 | 0 | 12 | 2 |
| 6 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 1 | 0 | 0 |
| 8 | 0 | 0 | 0 | 1 | 0 |
Of the 33 GCFs whose GCF_method is not MIBiG (antiSMASH, Gecco, or mixed), 14 (42%) are in cluster 5. The cluster itself includes 123 GCFs.
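The 14/33 figure can be reproduced from the per-cluster table above. The sketch below is in Python and is an illustration only; it assumes that the table rows are Seurat clusters 0 to 8 and that the table covers exactly the GCFs whose method is not MIBiG, which is inferred from the matching totals rather than stated in the report.

```python
# Counts copied from the per-cluster table: one row per Seurat cluster,
# columns in the order given by `classes`.
classes = ["mixed", "NRPS", "PKS", "RiPP", "unknown"]
by_cluster = {
    0: [0, 0, 0, 0, 0],
    1: [0, 1, 0, 0, 0],
    2: [1, 1, 0, 1, 2],
    3: [2, 1, 0, 6, 0],
    4: [0, 0, 1, 0, 1],
    5: [0, 0, 0, 12, 2],
    6: [0, 0, 0, 0, 0],
    7: [0, 0, 1, 0, 0],
    8: [0, 0, 0, 1, 0],
}

# GCF counts by method, copied from the method table (MIBiG excluded).
non_mibig_by_method = {"antiSMASH": 8, "Gecco": 7, "mixed": 18}

per_cluster = {cluster: sum(row) for cluster, row in by_cluster.items()}
total = sum(per_cluster.values())
share_cluster5 = per_cluster[5] / total

print(f"total: {total}, cluster 5: {per_cluster[5]} ({share_cluster5:.0%})")
```

Both tables give the same total, and within cluster 5 the split is 12 RiPP and 2 unknown.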
| GCF | cluster_length | class | GCF_method | GCF_rep | Number of BGCs |
|---|---|---|---|---|---|
| GCF0000086 | 10195 | RiPP | mixed | GCA_002157665_antiSMASH_BDOS01000001.1.region009 | 2 |
| GCF0000088 | 10204 | RiPP | mixed | GCA_002213005_antiSMASH_NBFC01000006.1.region001 | 366 |
| GCF0000089 | 5212 | RiPP | antiSMASH | GCA_000522725_antiSMASH_ALYW01000080.1.region001 | 1 |
| GCF0000093 | 6922 | unknown | Gecco | GCA_020531065_JAJBGA010000047.1_cluster_1 | 1 |
| GCF0000095 | 2294 | RiPP | antiSMASH | GCA_020529845_antiSMASH_JAJBDA010000106.region001 | 2 |
| GCF0000096 | 3151 | unknown | mixed | GCA_015668935_JADOZJ010000006.1_cluster_3 | 8 |
| GCF0000098 | 3959 | RiPP | antiSMASH | GCA_000339575_antiSMASH_AHTD01000072.1.region001 | 4 |
| GCF0000101 | 1397 | RiPP | antiSMASH | GCA_000340135_antiSMASH_AHTC01000107.1.region001 | 1 |
| GCF0000104 | 1823 | RiPP | antiSMASH | GCA_020529725_antiSMASH_JAJBCW010000149.region001 | 1 |
| GCF0000105 | 1221 | RiPP | mixed | GCA_033485055_JAVKRV010000007.1_cluster_1 | 9 |
| GCF0000107 | 10231 | RiPP | mixed | GCA_020529495_antiSMASH_JAJBCM010000002.region001 | 848 |
| GCF0000350 | 10228 | RiPP | mixed | GCA_012641545_antiSMASH_RXYY01000001.1.region003 | 30 |
| GCF0001639 | 2695 | RiPP | Gecco | GCA_002155285_CP021318.1_cluster_6 | 1 |
| GCF0001640 | 2276 | RiPP | antiSMASH | GCA_020530685_antiSMASH_JAJBED010000074.region001 | 1 |
Note: orderable type strain.
| | cluster_id | gcf_id | class | method | GCF_method |
|---|---|---|---|---|---|
| 2208 | GCA_019048645_antiSMASH_CP077404.1.region001 | GCF0000099 | NRPS | antiSMASH | mixed |
| 2209 | GCA_019048645_antiSMASH_CP077404.1.region003 | GCF0000091 | NRPS | antiSMASH | mixed |
| 2210 | GCA_019048645_antiSMASH_CP077404.1.region004 | GCF0000103 | NRPS | antiSMASH | mixed |
| 2211 | GCA_019048645_antiSMASH_CP077404.1.region005 | GCF0000107 | NRPS | antiSMASH | mixed |
| 2212 | GCA_019048645_antiSMASH_CP077404.1.region007 | GCF0000088 | NRPS | antiSMASH | mixed |
| 2213 | GCA_019048645_CP077404.1_cluster_1 | GCF0000099 | NRPS | gecco | mixed |
| 2214 | GCA_019048645_CP077404.1_cluster_3 | GCF0000091 | NRPS | gecco | mixed |
| 2215 | GCA_019048645_CP077404.1_cluster_4 | GCF0000103 | NRPS | gecco | mixed |
| 2216 | GCA_019048645_CP077404.1_cluster_5 | GCF0000107 | NRPS | gecco | mixed |
| 2217 | GCA_019048645_CP077404.1_cluster_8 | GCF0000088 | NRPS | gecco | mixed |
Number of BGCs in each novel GCF that includes at least one BGC from the genome GCA_019048645:
| GCF | Number of BGCs |
|---|---|
| GCF0000099 | 1079 |
| GCF0000091 | 425 |
| GCF0000103 | 832 |
| GCF0000107 | 848 |
| GCF0000088 | 366 |
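The lookup behind this list can be sketched as follows: select the BGCs that belong to one genome, collect the GCFs they were assigned to, and report the size of each. This is a hedged Python illustration, not the report's own code. The helper names are hypothetical, and the ID parsing assumes the pattern visible in the table above: the accession is the first two underscore-separated tokens, and antiSMASH IDs carry an `antiSMASH` token directly after it.

```python
# (BGC ID, GCF) pairs copied from the table for genome GCA_019048645.
assignments = [
    ("GCA_019048645_antiSMASH_CP077404.1.region001", "GCF0000099"),
    ("GCA_019048645_antiSMASH_CP077404.1.region003", "GCF0000091"),
    ("GCA_019048645_antiSMASH_CP077404.1.region004", "GCF0000103"),
    ("GCA_019048645_antiSMASH_CP077404.1.region005", "GCF0000107"),
    ("GCA_019048645_antiSMASH_CP077404.1.region007", "GCF0000088"),
    ("GCA_019048645_CP077404.1_cluster_1", "GCF0000099"),
    ("GCA_019048645_CP077404.1_cluster_3", "GCF0000091"),
    ("GCA_019048645_CP077404.1_cluster_4", "GCF0000103"),
    ("GCA_019048645_CP077404.1_cluster_5", "GCF0000107"),
    ("GCA_019048645_CP077404.1_cluster_8", "GCF0000088"),
]

# GCF sizes as reported above.
gcf_sizes = {
    "GCF0000099": 1079,
    "GCF0000091": 425,
    "GCF0000103": 832,
    "GCF0000107": 848,
    "GCF0000088": 366,
}


def parse_bgc_id(bgc_id):
    """Split a BGC ID into (genome accession, detection tool).

    Assumption: IDs without an 'antiSMASH' token come from GECCO.
    """
    tokens = bgc_id.split("_")
    genome = "_".join(tokens[:2])
    tool = "antiSMASH" if tokens[2] == "antiSMASH" else "GECCO"
    return genome, tool


def gcfs_for_genome(pairs, genome):
    """GCF IDs with at least one BGC from `genome`, in order of first use."""
    seen = []
    for bgc_id, gcf in pairs:
        if parse_bgc_id(bgc_id)[0] == genome and gcf not in seen:
            seen.append(gcf)
    return seen


for gcf in gcfs_for_genome(assignments, "GCA_019048645"):
    print(f"{gcf} : {gcf_sizes[gcf]}")
```

Each of the five GCFs is reached by both an antiSMASH region and a GECCO cluster from this genome, which is consistent with their GCF_method of mixed.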